Yifan Zhang, yifan.zhang@asu.edu
Xing Liang, xliang22@asu.edu
Michael Steptoe, msteptoe@asu.edu
Sagarika Kadambi, sskadamb@asu.edu
Wei Luo, wluo23@asu.edu
Dawei Zhou, dawei.zhou@asu.edu
Hanghang Tong, htong6@asu.edu
Jingrui He, jingruih@asu.edu
Ross Maciejewski, rmacieje@asu.edu
Student Team: Yes
Did you use data from both mini-challenges? Yes
https://www-complexnetworks.lip6.fr/~latapy/PP/walktrap.html
Approximately how many hours were spent working on this submission in total?
240 hours between the students
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? Yes
Video Download
Video: https://youtu.be/VGXrxlxstt4
Questions
MC2.1 – Identify those IDs that stand out for their large volumes of communication. For each of these IDs
a. Characterize the communication patterns you see.
b. Based on these patterns, what do you hypothesize about these IDs?
Limit your response to no more than 4 images and 300 words.
We used a calendar view (16 hours vertically by 60 minutes horizontally) and a histogram to identify IDs with large volumes of communication. We categorize communication volume into four types: sent, received, external, and unique (i.e., if a visitor sends multiple communications at the same time, they count only once toward the unique volume). We can explore the sent-message histogram and interactively select bins with a high volume of messages to extract the corresponding IDs. We find two IDs with high volumes of sent messages: 1278894 and 839736.
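The four volume types can be illustrated with a minimal sketch. The tuple layout `(timestamp, sender, receiver, location)`, the `"external"` receiver label, and the sample records are assumptions made for illustration, not the exact pipeline behind the views. Here "unique" counts distinct sending timestamps per sender.

```python
from collections import defaultdict

def volume_by_id(records):
    """Return {id: {"sent", "received", "external", "unique"}} counts.

    records: iterable of (timestamp, sender, receiver, location) tuples.
    The layout and the "external" receiver label are assumptions.
    """
    stats = defaultdict(lambda: {"sent": 0, "received": 0, "external": 0, "unique": 0})
    seen = set()  # (sender, timestamp) pairs already counted as unique
    for timestamp, sender, receiver, _location in records:
        stats[sender]["sent"] += 1
        if receiver == "external":
            stats[sender]["external"] += 1
        else:
            stats[receiver]["received"] += 1
        # Several messages sent at the same time count once toward "unique".
        if (sender, timestamp) not in seen:
            seen.add((sender, timestamp))
            stats[sender]["unique"] += 1
    return dict(stats)

# Made-up sample records, for illustration only.
records = [
    ("2014-06-06 12:00:00", "A", "B", "area-1"),
    ("2014-06-06 12:00:00", "A", "C", "area-1"),
    ("2014-06-06 12:05:00", "A", "external", "area-1"),
    ("2014-06-06 12:06:00", "B", "A", "area-2"),
]
stats = volume_by_id(records)
print(stats["A"])
```

An ID whose sent volume is far larger than its unique volume sends many messages at the same moment, which is the signature of a broadcast.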
ID 1278894 has more than 189000 sent and received messages; however, its unique volume is only 180 and it has no external communications. In the calendar view of this ID, we see a high volume of communication every five minutes, starting at 12:00 and ending at 21:00 every day. We therefore hypothesize that this ID is an automated park messaging bot that broadcasts announcements.
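A regular sending interval of this kind can also be checked numerically. The sketch below is one possible check, not the method used in the calendar view: it finds the most common gap between consecutive distinct sending minutes. The timestamp format and the sample values are assumptions.

```python
from collections import Counter
from datetime import datetime

def dominant_gap_minutes(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (gap_in_minutes, share_of_gaps) for the most common gap between
    consecutive distinct sending minutes, or None with fewer than two minutes."""
    minutes = sorted({datetime.strptime(t, fmt).replace(second=0) for t in timestamps})
    gaps = [int((b - a).total_seconds() // 60) for a, b in zip(minutes, minutes[1:])]
    if not gaps:
        return None
    gap, count = Counter(gaps).most_common(1)[0]
    return gap, count / len(gaps)

# Made-up send times for one ID, for illustration only.
sample = [
    "2014-06-06 12:00:10", "2014-06-06 12:00:10", "2014-06-06 12:05:02",
    "2014-06-06 12:10:00", "2014-06-06 12:15:30", "2014-06-06 12:17:00",
]
result = dominant_gap_minutes(sample)
print(result)
```

A dominant gap of five minutes that covers most of the gaps would support the bot hypothesis; an ID without a dominant gap has no such regular schedule.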
ID 839736 has almost identical sent and received message counts (over 60000), no external communications, and a large number of unique communications. Its calendar view does not show any obvious temporal pattern, so we hypothesize that this ID belongs to the park and replies to visitors, like an information service center.
After excluding these two IDs, we look into the other top IDs that have high volumes of communication.
These 10 IDs have over 34 sent messages (more than received) and very few external communications. Their unique volume is one tenth of the sent volume. The calendar views for these IDs show that the high volume of communication is not evenly distributed over time but occurs in bursts at certain times (see ID 1045021), and all of them were in the park on all three days. Therefore, we hypothesize that these IDs are likely park staff.
MC2.2 – Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.
Limit your response to no more than 10 images and 1000 words.
For this question, we linked each patron ID to the movement data and attributed each communication to the closest attraction in the park. These locations are then shown in a calendar view, where each cell is shaded by the number of To/From/External/Unique calls. We explored these locations to identify typical call patterns related to rides in the park.
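The linking step can be sketched as follows, assuming each visitor has a time-sorted movement track of `(time, x, y)` records and each attraction has a known `(x, y)` position. The function names, attraction names, coordinates, and times are hypothetical and serve only to illustrate the idea.

```python
from bisect import bisect_right
from math import hypot

def last_position(track, t):
    """track: list of (time, x, y) sorted by time. Return the (x, y) of the
    latest record at or before t, or None if there is no such record."""
    times = [rec[0] for rec in track]
    i = bisect_right(times, t)
    return None if i == 0 else track[i - 1][1:]

def closest_attraction(position, attractions):
    """attractions: {name: (x, y)}. Return the name nearest to position."""
    x, y = position
    return min(
        attractions,
        key=lambda name: hypot(attractions[name][0] - x, attractions[name][1] - y),
    )

def locate_message(sender_track, t, attractions):
    """Attribute a message sent at time t to the sender's closest attraction."""
    position = last_position(sender_track, t)
    return None if position is None else closest_attraction(position, attractions)

# Hypothetical coordinates and times (seconds since park opening).
attractions = {"attraction-a": (10, 10), "attraction-b": (50, 40)}
track = [(0, 0, 0), (60, 12, 9), (300, 48, 41)]
print(locate_message(track, 90, attractions))
```

A message sent before the sender's first movement record cannot be placed, so the sketch returns `None` in that case.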
Pattern 1: Messages at attraction 32-Creighton Pavilion and 63-Grinosaurus Stage.
The image below shows the communication peaks at attractions 32 and 63 on Friday. The peaks at attraction 63 occurred between 09:30 and 11:30 and between 14:30 and 16:30, whereas the peak at attraction 32 occurred between 11:30 and 15:00. From this pattern, we can observe when the soccer star’s show is over. We can also infer that the pavilion is closed from 10:00 to 11:30 and from 15:00 to 16:30, as few communications are sent from there during these times.
Figure 1: Distribution of sent messages on Friday from 8AM to 24:00.
Note that this pattern changes on Sunday. There is only one communication peak at attraction 32 and one at attraction 63, unlike the previous two days. Based on the communication volume at these locations, we infer that the afternoon show at attraction 63 and the afternoon display at attraction 32 were canceled. We speculate that the vandalism stopped the show, which led to fewer communications.
Figure 2: Distribution of sent messages on Sunday from 8AM to 24:00.
Pattern 2: Park broadcasts.
We observe that some IDs broadcast large numbers of messages over a short period of time while others send messages intermittently. We hypothesize that the park uses a broadcast system to send messages to many patrons simultaneously. We assume that all messages sent by one ID within one minute are the same message, and we provide a count of all unique messages in the data. Using the calendar view, we can see the distribution of unique messages at each location and compare the high volume of sent/received messages with the number of unique messages being sent. In the figure below, the left shows the total messages sent on Sunday and the right shows the unique messages sent. The unique messages sent at the pavilion are of the same magnitude as the total number of messages, which means that at this time it is the visitors who are sending the messages; this likely corresponds to the discovery of the vandalism.
Figure 3: Comparison of the sent (left) and unique (right) messages.
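The comparison of total and unique messages per calendar cell can be sketched as follows. The sketch applies the stated assumption that all messages from one ID within one minute are the same message. The record layout `(HH:MM, sender, location)` and the sample data are made up for illustration, and `slot_minutes` is assumed to divide 60.

```python
from collections import defaultdict

def cell_totals(records, slot_minutes=30):
    """Return {(location, slot_start): (total, unique)} per calendar cell.

    records: iterable of ("HH:MM", sender, location) tuples (assumed layout).
    unique counts distinct (sender, minute) pairs within the cell.
    """
    totals = defaultdict(lambda: [0, 0])
    seen = set()
    for hhmm, sender, location in records:
        hour, minute = map(int, hhmm.split(":"))
        slot = f"{hour:02d}:{minute - minute % slot_minutes:02d}"
        cell = totals[(location, slot)]
        cell[0] += 1
        key = (sender, hhmm, location)
        if key not in seen:  # first message from this sender in this minute
            seen.add(key)
            cell[1] += 1
    return {cell: tuple(counts) for cell, counts in totals.items()}

# Made-up data: one broadcast to four receivers, then three individual visitors.
records = [("11:39", "bot", "stage")] * 4 + [
    ("11:45", "v1", "pavilion"),
    ("11:45", "v2", "pavilion"),
    ("11:46", "v3", "pavilion"),
]
cells = cell_totals(records)
print(cells)
```

A cell where unique is close to total is dominated by individual senders, while a cell where unique is much smaller than total is dominated by broadcasts.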
We also support visualizing the number of messages sent to people external to the park. We can see that the volume of external messages also peaks from 12:00 to 12:30, further indicating that the vandalism had been discovered.
Figure 4: Comparison of the external message amounts.
Pattern 3: Criminal identification
Since we suspect the crime occurred at the Pavilion between 10:30 and 11:30AM, we explore the communication patterns of people located near the Pavilion during this time. First, we use the histogram to visualize the people who have external communications during that time. This gives us a list of visitor IDs: 461004, 416790, 1502920, 611447, 771453, 416790, 668872, 921888, 988181, 1101361, 1102394, 1358860, 1364488, 1872848, 1938686.
Then we use our communication explorer tool (below) to explore those IDs and find that three of them (461004, 416790, 1502920) behave abnormally. They came into the park together and then went to the Pavilion around 9:00am. Over the next 30 minutes, they stayed there with few communications. However, from 09:40AM until 12pm they communicated heavily, in sharp contrast to the previous 30 minutes.
Figure 5: Visualization of the visitors’ changes in checked-in attraction and their communications. In this view, the x axis represents the time from 08:00 to 24:00 and the y axis represents the attractions that the visitors are near. Circles in the plot represent the movement of the visitors from one attraction to another, and the circle size encodes the number of patrons. A circle consists only of patrons that have a connection in the communication data. In the lower left, a small black circle shows up at attraction 84, which means that 1-2 people checked in at that time. After a while, this group moved to attraction 81; a green line is drawn to emphasize its movement. Hovering over the line shows whether patrons joined or left the group. If the circle changes to red, a person left the group at this attraction; if it changes to blue, a person joined the group. Short blue lines represent the internal communications among the visitors at one attraction, and short red lines represent their external communications. In this way we hope to explore how patrons that communicate with each other travel around the park.
To further trace the movement of the group (461004, 416790, 1502920), we visualize their attraction records and communications in the same graph. We can clearly see that, most of the time, they visit the same attractions and barely separate.
Figure 6: Group (461004, 416790, 1502920)’s check-ins and communications.
Another suspect is visitor 921888, who checked in at the Pavilion right after the group (461004, 416790, 1502920) (Figure 5). According to the records, this was the first day that this patron came to the park, and he/she returned to the hotel at 12:10pm (Figure 7).
Figure 7: Visitor 921888’s check-ins and communications.
Pattern 4: High external call volumes
We also observed patterns of high external call volumes.
Figure 8: Histogram of the total external volume, excluding the broadcasting IDs.
We explore the movement and communication pattern of one such patron, visitor 1711922. We find that this person sends a large number of external messages at the Pavilion even though the Pavilion is closed.
Figure 9: Check-ins and communications for visitor 1711922.
Pattern 5: Visitor preferences
The figure below shows that more people sent messages from the thrill rides, which implies that these rides are popular.
Figure 10: The distribution of sent message volume on Sunday.
MC2.3 – From this data, can you hypothesize when the crime was discovered? Describe your rationale.
Limit your response to no more than 3 images and 300 words.
From the data, we hypothesize that the crime was discovered on Sunday between 11:30 and 11:45AM. In the attraction calendar view, all four types of communication (Sent, Received, External and Unique) around the Pavilion (attraction 32) on Friday and Saturday display a strong temporal pattern, with high-volume peaks from 8:30-10:00, 11:30-15:00, and 16:30-21:00. On Sunday, the first peak from 8:30 to 10:00 was present, but the second peak from 11:30 to 15:00 changed and the third peak was gone.
The sent volume is a few thousand messages from 11:30-12:00 on Friday and Saturday at the Pavilion, but it reaches more than 29000 during that time on Sunday. We can also see that the volume of communication decreases significantly after 12:30 on Sunday, so we hypothesize that the crime was discovered between 11:30 and 12:00. Next, we investigate the detail view of the communications from 11:30-12:00 by clicking that cell in the calendar. A set of circles is displayed above the calendar, each colored by the number of communications per minute during this half-hour block. We can see that the external volume has a clear division at 11:45am. We hypothesize that the crime was discovered by the staff at 11:38AM and that several announcements were broadcast, which resulted in increased sent and received volumes at 11:39, 11:41, and 11:44. After that, visitors started messaging their friends and staff may have contacted the authorities, which resulted in a significant increase in external volume at 11:45.
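The per-minute drill-down can be approximated numerically. The sketch below counts messages per minute within a half-hour block and reports the minute with the largest rise over the previous minute. The function names and the sample minute values (minutes since midnight) are made up for illustration and are not the challenge data.

```python
from collections import Counter

def per_minute_counts(minutes, start, end):
    """Count messages per minute-of-day for start <= minute < end."""
    counts = Counter(m for m in minutes if start <= m < end)
    return [counts.get(m, 0) for m in range(start, end)]

def largest_jump(counts, start):
    """Return (minute, increase) for the biggest minute-over-minute rise.
    Assumes counts covers at least two minutes."""
    best = max(range(1, len(counts)), key=lambda i: counts[i] - counts[i - 1])
    return start + best, counts[best] - counts[best - 1]

def fmt(minute_of_day):
    """Format minutes since midnight as HH:MM."""
    return f"{minute_of_day // 60:02d}:{minute_of_day % 60:02d}"

# Made-up external-message minutes inside the 11:30-12:00 block (690..719).
external_minutes = [691, 700, 702] + [705] * 6 + [706] * 5
counts = per_minute_counts(external_minutes, 690, 720)
minute, increase = largest_jump(counts, 690)
print(fmt(minute), increase)
```

Running the same function on the sent, received, and external series separately would show whether broadcast spikes precede the rise in external messages.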